Unsupervised Two-Way Clustering of Metagenomic Sequences

Abstract

A major challenge facing metagenomics is the development of tools for characterizing the functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length, and the relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes multispecies, multidimensional mixture model for reads from a metagenome. We use the proposed model to cluster metagenomic reads by their species of origin and to characterize the abundance of each species. We model the distribution of word counts along a genome as a Gaussian for shorter, frequent words and as a Poisson for longer, rare words. We employ either a mixture of Gaussians or a mixture of Poissons to model reads within each bin. Further, we handle the high dimensionality and sparsity of the data by grouping the set of words comprising the reads, resulting in a two-way mixture model. Finally, we demonstrate the accuracy and applicability of this method on simulated and real metagenomes. Our method can accurately cluster reads as short as 100 bp and is robust to varying abundances, divergences, and read lengths.
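
To make the Poisson variant of the mixture model concrete, the following is a minimal sketch, not the authors' implementation. It counts overlapping words (k-mers) in each read and fits a naive Bayes mixture of Poissons with EM, so that each read gets a cluster label and each cluster gets a mixing proportion that plays the role of an abundance estimate. The function names, the word length `k=2`, the pseudo-count `eps`, the random soft initialization, and the read simulator are all assumptions made for illustration. The word-grouping step that makes the paper's model two-way is omitted here.

```python
import math
import random
from itertools import product


def kmer_counts(read, k=2):
    """Count overlapping k-mers (words) of a read over the alphabet ACGT."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:  # skip words containing N or other symbols
            counts[j] += 1
    return counts


def poisson_mixture_em(X, n_clusters=2, n_iter=50, seed=0, eps=1e-3):
    """EM for a naive Bayes mixture of Poissons over word-count vectors X.

    Returns hard labels, mixing proportions, and per-cluster word rates.
    eps is an assumed pseudo-count that keeps rates and proportions positive.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random soft initialization of the responsibilities (an assumption).
    resp = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])
    for _ in range(n_iter):
        # M-step: mixing proportions and Poisson rates per cluster and word.
        mass = [sum(resp[i][c] for i in range(n)) for c in range(n_clusters)]
        pis = [(m + eps) / (n + n_clusters * eps) for m in mass]
        rates = [
            [(sum(resp[i][c] * X[i][j] for i in range(n)) + eps) / (mass[c] + eps)
             for j in range(d)]
            for c in range(n_clusters)
        ]
        # E-step: the log(x!) term is the same for every cluster, so it is dropped.
        for i in range(n):
            logp = [
                math.log(pis[c])
                + sum(X[i][j] * math.log(rates[c][j]) - rates[c][j] for j in range(d))
                for c in range(n_clusters)
            ]
            top = max(logp)  # log-sum-exp trick for numerical stability
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]
    labels = [max(range(n_clusters), key=lambda c: resp[i][c]) for i in range(n)]
    return labels, pis, rates


def simulate_reads(base_probs, n_reads, length, rng):
    """Hypothetical helper: draw i.i.d. bases with the given A/C/G/T probabilities."""
    return ["".join(rng.choices("ACGT", weights=base_probs, k=length))
            for _ in range(n_reads)]


# Toy usage: two synthetic "species", one AT-rich and one GC-rich, 100 bp reads.
rng = random.Random(1)
reads = (simulate_reads([0.4, 0.1, 0.1, 0.4], 20, 100, rng)
         + simulate_reads([0.1, 0.4, 0.4, 0.1], 20, 100, rng))
X = [kmer_counts(r, k=2) for r in reads]
labels, pis, rates = poisson_mixture_em(X, n_clusters=2)
print(labels)
print([round(p, 3) for p in pis])
```

Real metagenomic reads are far less cleanly separated than this toy composition contrast, and longer words give sparse count vectors. That sparsity is the situation the abstract addresses by grouping words.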

Bibliographic details

  • Year: 2012
  • Original format: PDF
  • Language: English
